A three-dimensional (3D) imager based on a single-pixel detector and complementary intensity
modulation by a digital micromirror device (DMD) array is presented. It relies neither on raster
scanning of the scene, as in light detection and ranging (LIDAR), nor on a two-dimensional array of
sensors, as in time-of-flight (TOF) cameras. The imager captures full-color, high-quality images of
real-life objects and also recovers the depth information and 3D reflectivity of the scene, reducing
the required measurement dimension and the complexity, and reducing the detector array, and hence
its cost, to a single unit. Spatial resolution is achieved using compressed sensing to exploit the
sparsity of the signal. The disparity maps of the scene are reconstructed using the sum of absolute
or squared differences to reveal the depth information. This nonscanning, low-complexity 3D
reflectivity imaging prototype may be of considerable value to various computer vision
applications. © 2015 Optical Society of America

OCIS codes: (110.3010) Image reconstruction techniques; (110.2990) Image formation theory;
(110.1650) Coherence imaging; (200.4740) Optical processing.

1. Introduction


Acquiring three-dimensional (3D) structure and re- 
flectivity with an active imager has many applica- 
tions. Active range acquisition systems such as 
light detection and ranging (LIDAR) and time-of- 
flight (TOF) cameras obtain the range information 
from a single viewpoint by measuring the time differ- 
ence of arrival between a transmitted pulse and the 
scene reflection. In the former, transverse resolution 
is obtained by single-pixel devices via raster scan- 
ning [1-3]. The latter typically replaces scanning 
with spatially resolving detectors to acquire the 
depth map [4,5]. However, LIDAR systems are limited by the scanning
time, and TOF cameras are limited by the difficulty of fabricating
high-resolution sensor arrays. Moreover, in both systems the single
pixel or the individual pixels of the sensor array are very small and
the optical flux must be distributed across the entire array, so the
shot noise is significant for each pixel. Thus, they achieve high
depth resolution but suffer from poor spatial resolution.

A solution is to remove the need for both a spatially resolving
detector and scanning. Computational imaging based on random patterns
is a good alternative: it recovers the image of the object by
correlating the known spatial distribution of a changing speckle
pattern with the total reflected (or transmitted) light intensity.
This technique, a variant of ghost imaging (GI), is called
computational GI [6,7]. Initially, ghost images were generated from
two correlated light fields and two detectors: a bucket (single-pixel)
detector without spatial resolution is positioned in the signal arm to
collect the total light intensity coming from the object, while a
spatially resolving detector in the reference arm records the light
field that has not interacted with the object. Its first demonstration
[8,9] used coincidence counts of signal-idler biphoton 
pairs, and then it was proved that GI is also achiev- 
able with pseudothermal [10-12] or true thermal 
light [13], inspiring much attention and applications 
such as optical encryption [14,15], correspondence 
imaging [16,17], lensless GI with sunlight [18], GI 
through turbulent atmosphere [19], or adaptive GI 
[20]. To improve the quality of ghost images, Katz et al. [21]
experimentally demonstrated that reconstruction algorithms based on
compressed sensing (CS) [22-24] achieve better performance with far
fewer measurements than conventional GI. This is based on the
single-pixel camera scheme proposed by Baraniuk et al. [25], where a
bucket (single-pixel) detector is located on the focal plane to
collect the total light intensity. We found that in this scheme only
the signal is magnified dramatically while the shot noise level is
kept unchanged, and thus this protocol has the maximum flux and
signal-to-noise ratio, as well as sensitivity. Replacing the detector
with a single-photon single-pixel detector would make ultraweak light
detection possible. In addition, with the help of CS algorithms,
images can be recovered accurately from far fewer measurements than
what is usually considered necessary. Therefore, we use the
single-pixel camera here instead of standard cameras.
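
The correlation step of computational GI described above can be
illustrated with a minimal sketch. It assumes noise-free bucket
values, random binary patterns, and a hypothetical 8-pixel scene; the
function name ghost_image is illustrative and is not taken from any of
the cited works.

```python
import random

def ghost_image(patterns, bucket):
    """Correlate bucket-signal fluctuations with the known patterns."""
    m = len(patterns)
    n = len(patterns[0])
    mean_bucket = sum(bucket) / m
    image = [0.0] * n
    for pattern, value in zip(patterns, bucket):
        weight = value - mean_bucket  # fluctuation of the total intensity
        for k in range(n):
            image[k] += weight * pattern[k]
    return [v / m for v in image]

rng = random.Random(0)
scene = [0, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-pixel reflectivity
patterns = [[rng.randint(0, 1) for _ in scene] for _ in range(4000)]
# Bucket detector: total intensity passed by each pattern (no noise assumed)
bucket = [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]
estimate = ghost_image(patterns, bucket)
print([round(v, 2) for v in estimate])
```

In this sketch the reflective pixels come out clearly brighter than
the dark ones, but only after thousands of patterns for eight pixels,
which is the measurement burden that CS reconstruction is meant to
reduce.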

On another note, based on this single-pixel camera 
scheme, Howland et al. [26] proposed a laser radar 
system for 3D imaging where transverse spatial res- 
olution is obtained through compressive sampling 
without scanning or array detection. Subsequently, 
this protocol was improved by using parametric sig- 
nal modeling to recover the set of distinct depth 
ranges present in the scene [27]. Recently, Sun et al. 
[28] demonstrated that the 3D spatial form of an ob- 
ject can also be captured by comparing the shading 
information in the images, which are derived from 
several single-pixel detectors in different locations. 
In their scheme, they measured the differential signals of
complementary illumination pairs, which in effect normalizes the
bucket signals so that the positive and negative intensity
fluctuations average to 0, and then correlated them with the
noninverted patterns to reconstruct a ghost image. Their method was
essentially the same as correspondence imaging [16,17], except that
their GI bucket values were used as a series of weighting factors
instead of just 0 and 1.

Recently we have developed a novel technique, which we call
complementary compressive imaging, that makes full use of both
complementary reflections and uses two single-pixel detectors to
dramatically improve the image quality [29]. In this work, we extend
this technique to the imaging of 3D reflectivity using only a
single-pixel detector, without a time-correlated module or scanning
components. The spatial resolution is generated by random patterns and
their inverses on a digital micromirror device (DMD). The object and
the light source are fixed onto the same rotating platform to ensure
that the object is always illuminated from only one direction. After
the platform is rotated by a small angle, two images are reconstructed
by a CS algorithm as if they were taken from different viewpoints. A
binocular stereo algorithm then allows the surface gradient, and hence
the 3D reflectivity, to be reconstructed.


2. Experimental Setup and 2D Imaging Results 


Our setup (Fig. 1) consists of a light-emitting diode (LED) lamp
illuminating a 3D target; a rotating platform that carries both the
source and the target; a rotating filter set composed of red, green,
and blue filters; a DMD to perform light intensity modulation with
computer-generated random one-to-one complementary binary pattern
pairs; a single-pixel detector to measure the total intensity of the
back-reflected light; an imaging lens; a collecting lens; and a
computer to generate the random complementary patterns and to perform
the 3D reconstructions of the target.

The platform ensures that the LED lamp illuminates the target from
only one orientation. The object is positioned at the rotational axis
of the platform, which is about 26 cm away from the working plane of
the DMD. The back-reflected optical flux of the object is transmitted
through the rotating filter set and focused via the imaging lens onto
the DMD, which consists of 768 × 1024 micromirrors, each of size
13.68 × 13.68 μm². Then, through a collecting lens, the light is
converged onto one spot. A photomultiplier tube (PMT) (Hamamatsu
H7468-20) is used here as the bucket (single-pixel) detector to
collect and record the total light intensity reflected from the DMD
for every pattern. The integration time of the PMT is set to 900 μs,
which is shorter than the interval between flips of the micromirrors,
and the dead time is set to 200 ps. The bias voltage of the PMT is
400 V. Finally, the measured signals are fed to a computer algorithm
that reconstructs an image for each rotation position of the platform
and then recovers the 3D reflectivity.

Fig. 1. Experimental setup used for 3D complementary compressive
reflectivity imaging. A rotating platform carrying an LED lamp and a
target turns through a small angle such that two images are derived
from the same single-pixel detector but appear as if they were taken
from different viewpoints, as in binocular active vision. For each
rotation position, the same computer-generated random one-to-one
complementary binary pattern pairs are encoded on the DMD.

Each mirror on the DMD rotates about a hinge and can be switched
between two positions oriented at +12° or -12° with respect to the DMD
surface, where micromirrors at +12° appear as bright pixels (1) and,
inversely, those at -12° appear as dark pixels (0). Note that the
large operational bandwidth of the DMD (270-1800 nm) makes this
technique suitable for imaging at visible or near-infrared
wavelengths. We set the modulation patterns of the DMD to a random
binary distribution with a black-to-white ratio close to 1:1. By
alternating between a binary pattern and its complement, we modulate
the light intensity at a frequency of 450 Hz. Because each pattern has
nearly equal numbers of black and white pixels, the difference matrix
of each complementary pattern pair is a randomly distributed binary
matrix taking the two values 1 and -1, and the corresponding
complementary differential bucket signal is normalized with a mean of
approximately 0. This has been shown to improve the image quality of
the 2D reconstruction [29].
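
The complementary modulation described above can be sketched in a few
lines. This is a minimal illustration assuming noise-free detection
and a hypothetical 64-pixel scene with random reflectivities; the
helper names complementary_pair and bucket are illustrative only.

```python
import random

def complementary_pair(n_pixels, rng):
    """Return a random binary pattern and its inverse (1 - pattern)."""
    pattern = [rng.randint(0, 1) for _ in range(n_pixels)]
    inverse = [1 - v for v in pattern]
    return pattern, inverse

def bucket(pattern, scene):
    """Total intensity collected for one pattern (noise-free assumption)."""
    return sum(p * s for p, s in zip(pattern, scene))

rng = random.Random(1)
n_pixels = 64
scene = [rng.random() for _ in range(n_pixels)]  # hypothetical reflectivities

differential_signal = []
for _ in range(2000):
    pattern, inverse = complementary_pair(n_pixels, rng)
    # Difference frame: every entry is +1 or -1
    difference = [p - q for p, q in zip(pattern, inverse)]
    differential_signal.append(bucket(pattern, scene) - bucket(inverse, scene))

mean_diff = sum(differential_signal) / len(differential_signal)
print("mean differential bucket signal:", mean_diff)
print("mean single-frame level:", sum(scene) / 2)
```

The differential signal fluctuates around zero, whereas each
single-frame bucket value sits on a large offset of roughly half the
total scene intensity.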

The principle behind the design of CS imaging systems can be
summarized in the equation

$$ y = Ax + e, \tag{1} $$

where $y$ is an $M \times 1$ column vector of linear measurements; the
measurement matrix $A \in \mathbb{R}^{M \times N}$ contains $M$ row
vectors, which are reshaped from the patterns fed onto the DMD; $x$ is
the original image of interest with $p \times q$ pixels ordered in an
$N \times 1$ vector, where $N = p \times q$; and $e$, of size
$M \times 1$, denotes the noise. When $M < N$, this becomes an
underdetermined, ill-posed problem with infinitely many solutions.
However, in most instances, natural signals have a sparse
representation in a certain basis $\Psi$ (e.g., Haar wavelet, discrete
cosine transform, Fourier transform, or noiselet transform basis
[30]). According to this prior knowledge, CS theory asserts that a
small collection of nonadaptive linear measurements of a compressible
signal is sufficient for perfect recovery provided the measurement
matrix and representation basis are incoherent, enabling sub-Nyquist
measurement. Here, we expand $x$ on an orthogonal basis
$\Psi = [\psi_1, \psi_2, \ldots, \psi_N]$ as


$$ x = \Psi x', \quad \text{or} \quad x = \sum_{i=1}^{N} x'_i \psi_i, \tag{2} $$

where $x' \in \mathbb{R}^{N \times 1}$ is the coefficient sequence of
$x$ in the expansion. We say that $x'$ is $k$-sparse if at most $k$ of
the coefficients in $x'$ are nonzero. An empirical fact is that most
images are well approximated by $k$-sparse expansions with $k$ much
less than the number of pixels $N$, and this is the reason why data
compression is effective. For incoherent pairs, we only need on the
order of $k \log(N/k)$ random samples. Then the problem becomes


$$ y = A \Psi x' + e. \tag{3} $$
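
To make the notion of a k-sparse expansion concrete, the sketch below
applies an orthonormal one-dimensional Haar transform to a hypothetical
piecewise-constant signal of length N = 16. The signal and the function
name haar_1d are assumptions chosen for illustration; real images are
only approximately sparse.

```python
import math

def haar_1d(signal):
    """Orthonormal Haar transform of a signal whose length is a power of 2."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2)
                  for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2)
                  for i in range(half)]
        coeffs[:n] = approx + detail  # keep details, recurse on approximations
        n = half
    return coeffs

signal = [2.0] * 8 + [5.0] * 8  # hypothetical signal: all 16 samples nonzero
coeffs = haar_1d(signal)
k = sum(1 for c in coeffs if abs(c) > 1e-9)
n_total = len(signal)
print("nonzero Haar coefficients:", k, "of", n_total)
print("k log(N/k) =", k * math.log(n_total / k))
```

All 16 samples of the signal are nonzero, yet only two Haar
coefficients survive, so the signal is 2-sparse in this basis.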


Let $y'$ and $A'$ denote the bucket signal and the measurement matrix,
respectively. Since the two frames of each complementary pair appear
alternately, the complementary differential bucket signal is defined
as $y = y'_o - y'_{o+1}$, $o = 1, 3, 5, \ldots$. Accordingly, the
complementary differential frames can be written as
$A = A'_o - A'_{o+1}$, where the inverse frames are
$A'_{o+1} = 1_{m \times n} - A'_o$ and $1_{m \times n}$ stands for an
array of all ones. Then Eq. (3) can be rewritten as

$$ y'_o - y'_{o+1} = (A'_o - A'_{o+1}) \Psi x' + (e_1 - e_2). \tag{4} $$
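
As a short worked step, substituting the definition of the inverse
frame into the differential frame shows why its entries take only the
values 1 and -1. The element indices $i, j$ are introduced here only
for illustration.

```latex
\begin{align*}
A &= A'_o - A'_{o+1}
   = A'_o - \left(1_{m \times n} - A'_o\right)
   = 2A'_o - 1_{m \times n}, \\
A_{ij} &=
  \begin{cases}
    +1, & (A'_o)_{ij} = 1, \\
    -1, & (A'_o)_{ij} = 0.
  \end{cases}
\end{align*}
```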


Further, accurate recovery is achieved by solving a tractable convex
optimization program [24]. Recent research [31] has shown that using
total variation (TV) regularization instead of the $\ell_1$ term in CS
problems gives a sharper recovered image by preserving edges and
boundaries more accurately, and the gradient of an image is generally
sparse as well. To reconstruct an image, a solver named TVAL3 [31] is
applied to this TV-based minimization model:

$$ \min_x \sum_i \|D_i x\|_1 + \frac{\mu}{2} \|y - Ax\|_2^2, \tag{5} $$

where $D_i x$ is the discrete gradient vector of $x$ at position $i$,
$D$ is the gradient operator, $\sum_i \|D_i x\|_1$ is the discrete TV
of $x$, $\mu$ is a constant scalar used to balance the two terms, and
$\|\cdot\|_p$ stands for the $\ell_p$ norm, defined as
$\|x\|_p = (\sum_{i=1}^{N} |x_i|^p)^{1/p}$. The first term is small
when $D_i x$ is sparse. The second term is small when the optimal $x$
is consistent with Eq. (3) within a small error.
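
The two terms of the TV model can be evaluated directly for a
candidate image. The sketch below only computes the objective value; it
is not the TVAL3 solver. It assumes forward differences with zero
gradient at the image border, and the 2 × 2 images and the one-row
measurement matrix are hypothetical.

```python
def tv_objective(image, A, y, mu):
    """Evaluate sum_i ||D_i x||_1 + (mu / 2) * ||y - A x||_2^2 for a 2D image."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            # Forward differences; the gradient is taken as 0 at the border
            dx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0.0
            dy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0.0
            tv += abs(dx) + abs(dy)
    x = [v for row in image for v in row]  # image ordered as an N x 1 vector
    residual = [yi - sum(a * v for a, v in zip(row, x)) for row, yi in zip(A, y)]
    fidelity = sum(e * e for e in residual)
    return tv + 0.5 * mu * fidelity

A = [[1, 1, 1, 1]]  # a single measurement that sums all four pixels
y = [4]
flat = [[1, 1], [1, 1]]
checker = [[2, 0], [0, 2]]
print(tv_objective(flat, A, y, mu=10.0))
print(tv_objective(checker, A, y, mu=10.0))
```

Both candidates reproduce the measurement exactly, so the data term
vanishes for each; the TV term is what separates them and favors the
smooth image.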

By changing the color filters, we obtain red, green, and blue
component images for each rotation position of the platform via CS, as
shown in Figs. 2(a)-2(c) and 2(e)-2(g). The color images [see
Figs. 2(d) and 2(h)] are then recovered by synthesizing the three
reconstructed components using multiple grayscale encoding. Acquiring
a 2D image of size 128 × 128 pixels would take 128² = 16,384
measurements in LIDAR or 16,384 sensors in a TOF camera. In this
framework, however, only 9856 measurements (or patterns) in total,
about 60% of the total number of pixels, are used for high-quality 2D
image reconstruction. In comparison, LIDAR and TOF cameras suffer from
poor spatial resolution and do not use the sparsity inherent in 2D
imaging to achieve savings in the number of sensors or scanned pixels.
The imaging time of our system was 9856/450 = 21.9 s. In the future,
the modulation frequency of the DMD can reach 32.5 kHz, and the
detector can be replaced with an APD or a GHz-rate counter-type PMT,
so the imaging time of our system can be greatly reduced. In addition,
compared with standard cameras, the single-pixel camera scheme has the
advantages of high signal-to-noise ratio and high sensitivity in
large-flux subsampling with only a single pixel, although it may
sacrifice some imaging time because of the DMD. We also find that both
LIDAR and TOF cameras need a much longer integration time than our
imaging system, because the optical flux must be distributed across
the entire array on the imaging plane. If the total number of image
pixels is large enough, the imaging time of our system will be
comparable with that of standard cameras.

Fig. 2. Color reconstructions of a 3D target for each rotation
position of the platform. (a)-(c) and (e)-(g) correspond to the red,
green, and blue color channels of the 2D image in the first and second
positions, respectively. (d) and (h) are full-color reconstructions
obtained by combining the three separate color components (a)-(c) and
(e)-(g), respectively. The patterns group clusters of 6 × 6 mirrors,
so the size of one "pixel" (minimum resolution) is 82.08 × 82.08 μm².
Since the imaging region covers 768 × 768 mirrors in the center of the
DMD, we finally obtain a 2D image of size 128 × 128 "pixels" with 255
grayscales.
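
The figures quoted in this section follow from simple arithmetic, which
the short sketch below reproduces. The variable names are
illustrative, and the projected time at 32.5 kHz assumes that the
detector and readout can keep pace with the DMD.

```python
mirror_pitch_um = 13.68    # size of one micromirror
bin_size = 6               # mirrors grouped per image "pixel" (6 x 6)
region_mirrors = 768       # side of the imaging region on the DMD
n_patterns = 9856          # measurements actually used
rate_hz = 450              # modulation frequency in the experiment

pixel_um = bin_size * mirror_pitch_um     # side of one image "pixel"
side = region_mirrors // bin_size         # image side in "pixels"
n_pixels = side * side                    # total number of image pixels
sampling_ratio = n_patterns / n_pixels    # fraction of pixels measured
imaging_time_s = n_patterns / rate_hz     # acquisition time per image

# Projection only: assumes the detector keeps up with a 32.5 kHz DMD
future_time_s = n_patterns / 32500

print(f"pixel size: {pixel_um:.2f} um, image: {side} x {side}")
print(f"sampling ratio: {sampling_ratio:.1%}, time: {imaging_time_s:.1f} s")
print(f"projected time at 32.5 kHz: {future_time_s:.2f} s")
```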


3. 3D Reconstructions Using Binocular Stereo Vision 


Because the object is fixed in the center of the platform and the LED
lamp is suspended in midair on a support pole fixed onto the edge of
the platform, the apparent lighting of the object depends on the
illumination orientation of the light source. The images [Figs. 2(d)
and 2(h)], however, are derived from different rotation positions of
the platform, so the intensity distributions of the two images differ.
Normally, depth information of a scene is lost in a 2D image, but a
stereo vision technique named binocular stereo vision can extract
depth information from two images taken from different viewpoints,
much as our eyes do. Stereo vision algorithms can be roughly divided
into feature-based and area-based algorithms. Feature-based algorithms
are based on epipolar geometry, where each object point in one of the
stereo images can be found on a specific line, called the epipolar
line, in the other image. They use characteristics in the images such
as edges or corners, comparing their similarities to reveal the depth
of the scene, and then build the disparity map by computing the
displacement between those features. Area-based algorithms match
blocks of pixels to find correspondences in the images. Common
matching methods in area-based algorithms aggregate the sum of
absolute or squared differences (SAD/SSD) over a window. These methods
can be implemented efficiently using filters, e.g., Laplacian of
Gaussian (LoG) and mean filters, for removing noise and changes in
bias. Various other algorithms exist for area-based matching, but most
of them are computationally too expensive. Given that SAD and SSD are
easy to implement and use less computing power, we use them here for
3D reconstructions. SAD and SSD are defined as

$$ \mathrm{SAD} = \sum_{i=-L}^{L} \sum_{j=-L}^{L} \left| I_{r+i,\,c+j} - I'_{r'+i,\,c'+j} \right|, \tag{6} $$

$$ \mathrm{SSD} = \sum_{i=-L}^{L} \sum_{j=-L}^{L} \left| I_{r+i,\,c+j} - I'_{r'+i,\,c'+j} \right|^2, \tag{7} $$

where $L = (s - 1)/2$, $I$ is the primary and $I'$ the secondary image
being matched against each other, with $(r, c)$ and $(r', c')$,
respectively, as the center coordinates of the current block. We
perform the matching using the image of the first rotation position as
the primary one. The block size $s$ affects the quality of the
disparity map.

Fig. 3. 3D reconstructions using binocular stereo vision: disparity
maps retrieved by (a) SAD-based and (b) SSD-based stereo-matching
algorithms.
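
A minimal sketch of SAD/SSD block matching is given below. It assumes
that corresponding points lie on the same image row and searches only
horizontal shifts in one direction; the test images, the search range,
and the function names block_cost and best_disparity are hypothetical
and are not the implementation used for Fig. 3.

```python
def block_cost(primary, secondary, r, c, r2, c2, s, squared=False):
    """SAD (or SSD if squared=True) between two s x s blocks.

    The blocks are centered at (r, c) in the primary image and at
    (r2, c2) in the secondary image; s must be odd.
    """
    L = (s - 1) // 2
    total = 0
    for i in range(-L, L + 1):
        for j in range(-L, L + 1):
            d = primary[r + i][c + j] - secondary[r2 + i][c2 + j]
            total += d * d if squared else abs(d)
    return total

def best_disparity(primary, secondary, r, c, s, max_disp, squared=False):
    """Horizontal shift (0..max_disp) with the lowest matching cost."""
    L = (s - 1) // 2
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        c2 = c - d
        if c2 - L < 0:  # block would leave the secondary image
            break
        cost = block_cost(primary, secondary, r, c, r, c2, s, squared)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Hypothetical test pair: the secondary image is the primary shifted
# left by two columns (columns without data are filled with 0).
rows, cols = 5, 12
primary = [[c * c + r for c in range(cols)] for r in range(rows)]
secondary = [[primary[r][c + 2] if c + 2 < cols else 0 for c in range(cols)]
             for r in range(rows)]

print(best_disparity(primary, secondary, r=2, c=6, s=3, max_disp=4))
print(best_disparity(primary, secondary, r=2, c=6, s=3, max_disp=4, squared=True))
```

Repeating the search for every block center yields a disparity map.
Both cost functions recover the two-column shift in this example.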

After enlarging the 2D images shown in Fig. 2 fourfold to obtain more
pixels for the computation, we calculated the SAD and SSD with a
window size of 9 × 9 pixels and obtained the depth images illustrated
in Fig. 3. As Eqs. (6) and (7) show, the two algorithms differ only in
the square operation, and the SSD algorithm performs only slightly
better than the SAD algorithm. Thus, the performance of the two
algorithms in Fig. 3 is similar.


4. Conclusion 

In conclusion, we have experimentally demonstrated a 3D compressive
reflectivity imaging system with only a single-pixel detector and
complementary intensity modulation performed by a DMD. The system uses
a convex optimization algorithm to reconstruct full-color 2D images
from the sparsest coefficients in some basis, followed by an SAD/SSD
calculation to reveal the spatial structure of the depth maps. Unlike
LIDAR and TOF cameras, it uses the sparsity inherent in a 2D image to
reduce the number of scanning measurements needed in LIDAR or of
sensors needed in TOF, while retaining high-quality spatial
resolution. The rotating platform, with an LED lamp at its edge and a
3D target at its center, ensures that the LED lamp illuminates the
target from a fixed direction. By rotating the platform, we acquire
two full-color images from different viewpoints with different shading
distributions. Binocular stereo vision then makes it feasible to
extract depth information. An important difference between our
technique and LIDAR/TOF cameras is that only a single detector is used
to obtain spatial resolution and recover the disparity maps of the
object, removing the need for scanning or an array detector and
reducing the required measurement dimension, complexity, and cost,
which can be crucial for various computer vision applications.